Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique

نویسندگان

  • P. Nagabhushan
  • Alireza Alaei
چکیده

In this research work, we propose to identify an imaginary line called baseline threading through the entire stretch of text-line, with reference to which the location of vertical extents of Persian characters could be accurately interpreted. Depending upon the curvedness of the handwritten Persian text-line the baseline also would be curved. In this research a novel piece-wise painting scheme is proposed to prepare patches of black and white blocks all along the textline, identify some candidate points, regress a curve through these candidate points to trace the baseline which is subsequently stretched straight horizontally and subsequently we de-tilt the characters to align the text-line with the horizontal imaginary baseline properly. The proposed algorithm is evaluated with 108 Persian handwritten text-lines containing 3612 subwords. Experimental analysis showed that 91.2% of the subwords are accurately aligned. Further, the proposed scheme is tested with another dataset containing 600 text-lines [13] and more accurate results are achieved when compared with the results reported in state of the art for the same dataset. The effectiveness of aligning text-lines linearly is demonstrated through OCRing for readability of tilted printed English text-lines and corresponding transformed text-lines, which are obtained using the proposed procedure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new scheme for unconstrained handwritten text-line segmentation

Variations in inter-line gaps and skewed or curled text-lines are some of the challenging issues in segmentation of handwritten text-lines. Moreover, overlapping and touching text-lines that frequently appear in unconstrained handwritten text documents significantly increase segmentation complexities. In this paper, we propose a novel approach for unconstrained handwritten text-line segmentatio...

متن کامل

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

متن کامل

Region growing based segmentation algorithm for typewritten and handwritten text recognition

This paper presents a new technique of high accuracy to recognize both typewritten and handwritten English and Arabic texts without thinning. After segmenting the text into lines (horizontal segmentation) and the lines into words, it separates the word into its letters. Separating a text line (row) into words and a word into letters is performed by using the region growing technique (implicit s...

متن کامل

روشی جدید جهت استخراج موجودیت‌های اسمی در عربی کلاسیک

In Natural Language Processing (NLP) studies, developing resources and tools makes a contribution to extension and effectiveness of researches in each language. In recent years, Arabic Named Entity Recognition (ANER) has been considered by NLP researchers due to a significant impact on improving other NLP tasks such as Machine translation, Information retrieval, question answering, query result...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010